Effective Language Representations for Danmaku Comment Classification in Nicovideo

Abstract

Danmaku commenting has become popular for co-viewing on video-sharing platforms such as Nicovideo. However, many irrelevant comments contaminate the quality of the information provided by videos. This information pollution problem can be addressed by a comment classifier trained with an abstention option, which detects comments whose video categories are unclear. To improve performance on this classification task, this paper presents Nicovideo-specific language representations. Specifically, we used sentences from Nicopedia, a Japanese online encyclopedia of entities that possibly appear in Nicovideo contents, to pre-train a bidirectional encoder representations from Transformers (BERT) model. The resulting model, named Nicopedia BERT, is then fine-tuned so that it can determine whether a given comment falls into any of the predefined categories. Experiments conducted on Nicovideo comment data demonstrated the effectiveness of Nicopedia BERT compared with existing BERT models pre-trained using Wikipedia or tweets. We also evaluated the performance of each model on an additional sentiment classification task, and the obtained results implied the applicability of Nicopedia BERT as a feature extractor for other social media text.
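
The abstention option mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the authors implement abstention, so the confidence-threshold rule below is an assumption chosen for simplicity, not the paper's method. The category names, the `threshold` value, and the function `classify_with_abstention` are hypothetical, and the logits stand in for the output of a fine-tuned classifier.

```python
import math


def softmax(logits):
    """Convert raw classifier scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_abstention(logits, categories, threshold=0.6):
    """Return (category, confidence), or (None, confidence) when abstaining.

    Assumption: abstain whenever the top probability is below `threshold`.
    This is one common heuristic, not necessarily the rule used in the paper.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None, probs[best]  # category unclear: abstain
    return categories[best], probs[best]


# Hypothetical video categories and hand-written logits for illustration.
categories = ["game", "music", "anime"]
print(classify_with_abstention([4.0, 0.5, 0.2], categories))  # confident case
print(classify_with_abstention([1.0, 0.9, 1.1], categories))  # ambiguous case
```

In this sketch, a comment whose scores are nearly uniform across categories is returned as `None` rather than being forced into a category, which is the behavior an abstention option is meant to provide.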

Similar resources

Comment on ‘MeSH-up: effective MeSH text classification for

Information retrieval is an important task that requires specific attention in the biomedical domain where controlled vocabularies are available to characterize and organize textual content. A recent article published in Bioinformatics (Trieschnigg et al., 2009) confirms that there is a continued interest in the community to address this problem and achieve ‘improved document retrieval’. As sho...

Document Classification by Inversion of Distributed Language Representations

The goal of this note is to point out that any distributed representation can be turned into a classifier through inversion via Bayes rule. The approach is simple and modular, in that it will work with any language representation whose training can be formulated as optimizing a probability model. In our application to 2 million sentences from Yelp reviews, we also find that it performs as well ...

Learning Representations for Relation Classification

Knowledge bases can be applied to a wide variety of tasks such as search and question answering, however they are plagued by the problem of incompleteness. In this project, we propose two models for automated relation classification using extracted entity pairs and related sentences from natural text. We evaluate both models on a portion of the Stanford KBP dataset across 38 relations, achievin...

Text Representations for Patent Classification

gives a small, but significant, improvement in classification results on the CLEF-IP 2011 corpus, compared with classification on abstracts only. The effort involved in parsing the descriptions is considerable, however: Because of the long sentences and the dense word use, a parser will have much more difficulty in processing text from the description section than from the abstracts. The titles...

Response to comment on 'MeSH-up: effective MeSH text classification for improved document retrieval'

In response to the methodological considerations, we emphasize that in our paper we compare different MeSH classification systems on two tasks: (i) reproducing manual MeSH recommendations (referred to as indexing by Névéol et al.) and (ii) translating a textual query to an additional MeSH representation (referred to as query expansion). We show that the approach we propose works well on both ta...

Journal

Journal title: IEICE Transactions on Information and Systems

Year: 2023

ISSN: 0916-8532, 1745-1361

DOI: https://doi.org/10.1587/transinf.2022dap0010